Binding of processes in network systems

ABSTRACT

Binding processes in a network system involves monitoring the status of RMI processes by running a thread associated with a parent process. Each parent process in the network system is associated with a watchdog object that initiates a thread, the thread monitoring the status of RMI processes. If the thread determines that its associated parent process is not bound with an active RMI process, the thread automatically rebinds its parent process with an active RMI process.

BACKGROUND OF THE INVENTION

[0001] 1. Field of the Invention

[0002] The present invention relates to binding processes in a network system. More specifically, the present invention relates to ensuring that processes are bound to an active remote method invocation (RMI) process by monitoring the status of the RMI process.

[0003] 2. Related Art

[0004] Administration of large, multi-server computing environments is a field of growing interest as the number and size of such environments grow. The field of multi-server system administration and management focuses on maintaining the physical operation of a multitude of computer systems, often referred to as nodes, connected in a network. This task includes a number of functions, including adding, modifying and removing nodes, users, tools, and roles; defining groups of nodes; authorizing users to perform operations on nodes; installing, maintaining and configuring hardware; installing and upgrading operating system and application software; and applying software patches, among other functions.

[0005] A typical network includes a plurality of nodes, which are managed by a service control manager (SCM) running on a central management server (CMS). The nodes comprise a service control manager cluster, and can be further organized into node groups. In a CMS, a plurality of processes, referred to as “daemons,” are employed to perform tasks essential to run the network. The daemons are processes that perform tasks such as logging management actions by the SCM, managing users, and monitoring tasks assigned to nodes.

[0006] The daemons performing the above tasks may be located on differing JAVA® virtual machines (JVM), and remote method invocation (RMI) daemons are run in the network to allow daemons to communicate with one another. The RMI daemons serve as locators for daemons in the network, with agent daemons on each node accessing the RMI daemons in order to determine the network address, or universal resource locator (URL), for daemons in the network. A daemon in the network becomes accessible to users or other daemons by registering its URL in a URL list of an RMI daemon. This is commonly referred to as the daemon “binding” with the RMI daemon.

[0007] In conventional networks, if an RMI daemon becomes inactive for some reason, functioning daemons (and other processes) in the network remain bound to the inactive RMI daemon. In this case, it is not possible to communicate with the daemons bound to the inactive RMI daemon, because active RMI daemons would not include these daemons in their URL lists. In response to this situation, the network system restarts the daemons bound with the inactive RMI daemon. When the daemons restart, they are required to go through the process of registering with a new, active RMI daemon, which is time-consuming and introduces delay into the operation of the network.

[0008] Therefore, a need exists for a method of binding processes in a network that does not require restarting all of the processes bound with an RMI process when the RMI process becomes inactive.

SUMMARY OF THE INVENTION

[0009] The present invention overcomes the shortcomings of conventional methods and devices and may achieve further advantages not contemplated by conventional methods and devices.

[0010] According to a first aspect of the invention, processes in a network are each associated with a corresponding object, each object being capable of initiating a thread for monitoring the status of RMI. Processes having such an associated object are referred to as “parent processes.” According to an embodiment of the invention, a method of binding the parent processes comprises binding a parent process with an RMI process, and calling an object associated with the parent process, the object initiating a thread. The thread performs the steps of monitoring the status of RMI processes, and rebinding the parent process with an active RMI process when the thread determines that its parent process is not bound with an active RMI process.

[0011] According to the first aspect of the invention, parent processes in a network system need not be restarted when an RMI process becomes inactive, and may instead be automatically rebound with an active RMI process by the thread. Automatic rebinding of the parent process avoids delay and inconvenience to users of the network.

[0012] Other aspects and advantages of embodiments of the invention will be discussed with reference to the figures and to the detailed description of the preferred embodiments.

BRIEF DESCRIPTION OF THE FIGURES

[0013] FIG. 1 illustrates a network system according to an embodiment of the present invention.

[0014] FIG. 2 illustrates a portion of a network according to an embodiment of the present invention.

[0015] FIG. 3 is a flow chart illustrating the startup of processes and a watchdog thread associated with parent processes.

[0016] FIG. 4 illustrates the operation of a watchdog thread associated with a particular parent process.

[0017] FIG. 5 is a sequence diagram illustrating the operation of a watchdog thread for a parent process.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0018] A network system and a method for binding processes in a network system according to the present invention will be described below by way of preferred embodiments and with reference to the accompanying drawings.

[0019] FIG. 1 illustrates an exemplary network system 10 according to an embodiment of the present invention. The network system 10 comprises an SCM 12 running on a CMS 14, and a plurality of remote nodes 16 managed by the SCM 12 on the CMS 14. Together, the plurality of remote nodes 16 managed by the SCM 12 make up an SCM cluster 17. A group of remote nodes 16 may be further organized into node groups 18.

[0020] The CMS 14 may be, for example, an HP-UX 11.x server running the SCM 12 software. The CMS 14 includes a memory (not shown), a secondary storage device 141, a processor 142, an I/UX server 32, an input device (not shown), a display device (not shown), and an output device (not shown). The memory, a computer readable medium, may include RAM or similar types of memory, and it may store one or more applications for execution by the processor 142, including the SCM 12 software. The secondary storage device 141 includes a data repository 26 for the SCM cluster 17, and a depot 30. The secondary storage device 141 may comprise a hard disk drive, a floppy disk drive, a CD-ROM drive, and other types of non-volatile data storage media. The CMS 14 also includes a web server 28 that allows web access to the SCM 12.

[0021] The processor 142 executes the SCM 12 software and other applications, which are stored in memory or in the secondary storage device 141, or received from the Internet or, in general, from another network 24. The SCM 12 may be programmed in Java®, and may operate in a Java® environment. Java is an object-oriented programming language, and objects operating in a Java® Virtual Machine (“JVM”) provide the functionality of the SCM 12. Object-oriented programming is a method of programming that pairs programming tasks and data into re-usable chunks known as objects, each object comprising attributes (i.e., data) that define and describe the object. Java classes are meta-definitions that define the structure of a Java object. Java classes, when instantiated, create instances of the Java classes and are then considered Java objects. A detailed description of SCM is provided in, for example, ServiceControl Manager Technical Reference, HP part number: B8339-90019, which is hereby incorporated by reference, and which is available from Hewlett-Packard Company.

[0022] Generally, the SCM 12 supports managing an SCM cluster 17 from the CMS 14. Tasks performed on the SCM cluster 17 are initiated on the CMS 14 either directly or remotely, for example, by reaching the CMS 14 via a web connection 20. Therefore, a workstation 22 at which a user sits needs only the web connection 20 over the network 24 to the CMS 14, in order to perform tasks on the SCM cluster 17.

[0023] FIG. 2 illustrates a portion of the network system 10 according to an embodiment of the present invention. FIG. 2 illustrates the CMS 14, and one of the remote nodes 16 of the network system 10.

[0024] In the exemplary embodiment illustrated by FIG. 2, the functions of the SCM 12 are divided into a plurality of separate, long running, independently executing processes, which are referred to in the terminology of the UNIX systems community as “daemons.” FIG. 2 shows four such processes running on the CMS 14: a Distributed Task Facility process 210 (for example, a “DTF” process); a Log Manager process 212; a Domain Manager process 215; and an RMI process 205. For convenience, a process run by the CMS 14 can be generally referred to as a “management daemon,” if the process is a daemon, or, more generally, as a “management process.”

[0025] The Log Manager process 212 performs all of the functions of the SCM 12 necessary to maintain a log of the system management actions taken by the SCM 12. The log serves as an audit trail permitting an accounting of each step of each task performed by the SCM 12 on any of the nodes 16, node groups 18, or the SCM cluster 17, as well as on the CMS 14 itself. The Domain Manager process 215 performs the functions of the SCM 12 relating to the management of users and user groups on the SCM cluster 17. The Distributed Task Facility process 210 handles the assignment and monitoring of tasks assigned to be performed on each of the remote nodes 16. The RMI process 205 may be a JAVA® RMI process. Any of the processes 205, 210, 212, 215, 230 may be daemons.

[0026] Additional or different combinations of processes may be included in the CMS 14, and the configuration illustrated by FIG. 2 is intended to be exemplary.

[0027] The remote node 16 is illustrated as running a JAVA® RMI process 250, and an SCM Agent process 230 (for example, an “SCM Agent” process). The CMS 14 also includes an SCM Repository 220. The RMI process 250 allows the processes 210, 212, 215, 230, which may be started in their own JVMs, to communicate with each other, even though they are in different JVMs.

[0028] In SCM environments such as those illustrated in FIG. 2, the RMI process 250 acts as an index, or locator. When one or more processes, such as the processes 210, 212, 215, 230, are started, the RMI process 250 stores the URL and object interface of each process that requires RMI functionality. The RMI process 250 can then tell any process that is looking for one of the registered processes on the remote managed node 16 where that process can be found. For example, when the DTF process 210 needs to communicate with the SCM agent process 230 to instruct it to perform an operation, it connects with the RMI process 250 and asks where the SCM agent process 230 having the given URL is located. The RMI process 250 responds with the interface object of the SCM agent process 230. The DTF process 210 may then connect with the SCM agent process 230 and communicate with it directly.

[0029] Before a process can be accessed in a network system, it must be registered with an active RMI process. A process registers with an RMI process by calling an RMI process initiated by an RMI object, which can be, for example, a JAVA® naming (“Naming”) object, and providing its URL and interface object to the RMI process. In this manner, the process becomes “bound” with the RMI process, and other processes, users, or other entities, may then access the process through the RMI process. Each unique machine that has a process operating in a JVM requires an RMI process to be present.
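The registration and lookup described in the two preceding paragraphs can be sketched with the standard `java.rmi` API. This is a minimal illustration under stated assumptions, not the SCM implementation: the class `BindingSketch`, the remote interface `TaskFacility`, and the name `dtf` are hypothetical, and the registry is created inside the same JVM on an unused local port so that the sketch runs without a separately started RMI process. The sketch uses `Registry.rebind` with a plain name; `java.rmi.Naming.rebind` performs the same registration from a URL-formatted name of the form `//host:port/name`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.rmi.NoSuchObjectException;
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;
import java.util.Arrays;
import java.util.List;

final class BindingSketch {

    /** Hypothetical remote interface standing in for a management process. */
    public interface TaskFacility extends Remote {
        String status() throws RemoteException;
    }

    /** Hypothetical implementation of the remote interface. */
    static final class TaskFacilityImpl implements TaskFacility {
        @Override
        public String status() {
            return "running";
        }
    }

    /** Picks an unused local port so the demo does not collide with a real registry. */
    static int freePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Unexports an object so that no RMI threads keep the JVM alive after the demo. */
    static void unexportQuietly(Remote exported) {
        if (exported == null) {
            return;
        }
        try {
            UnicastRemoteObject.unexportObject(exported, true);
        } catch (NoSuchObjectException e) {
            // The object was never exported; nothing to clean up.
        }
    }

    /** Binds one process under the given name and returns the registry's bound names. */
    static List<String> bindAndList(String name) {
        TaskFacility impl = new TaskFacilityImpl();
        Registry registry = null;
        try {
            // Assumption: an in-process registry plays the role of the RMI process.
            registry = LocateRegistry.createRegistry(freePort());
            // Exporting yields the interface object (stub) that other processes receive.
            Remote stub = UnicastRemoteObject.exportObject(impl, 0);
            // Registration: the process becomes "bound" with the RMI process.
            registry.rebind(name, stub);
            // A client asks the registry for the process by name.
            Remote found = registry.lookup(name);
            if (!(found instanceof TaskFacility)) {
                throw new IllegalStateException("unexpected interface object for " + name);
            }
            return Arrays.asList(registry.list());
        } catch (RemoteException | NotBoundException e) {
            throw new IllegalStateException("RMI demo failed", e);
        } finally {
            unexportQuietly(impl);
            unexportQuietly(registry);
        }
    }

    public static void main(String[] args) {
        System.out.println(bindAndList("dtf"));
    }
}
```

In a deployment along the lines of FIG. 2, the caller would cast the object returned by `lookup` to the remote interface and invoke its methods directly, which corresponds to the DTF process communicating with the SCM agent process once the RMI process has returned the interface object.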

[0030] Difficulties arise in conventional networks when an RMI process servicing a node becomes inactive for some reason, because processes bound with the inactive RMI process would not be locatable. Instead, an attempt to access a process bound with an inactive RMI process would result in contact with an active RMI process that does not include the requested process in its bound URL list. Conventional networks resolve this problem by an inefficient restart (automatically performed by the operating system) of the processes bound with the inactive RMI process, so that the registered processes can again bind (or, “rebind”) with the active RMI process.

[0031] The present invention overcomes the above shortcomings of conventional networks and achieves further advantages. According to an embodiment of the present invention, a process may be associated with its own object, which may be referred to as a “watchdog object,” the watchdog object serving to initiate a thread, which may be conveniently referred to as a “watchdog thread.” The watchdog thread monitors the status of RMI processes in order to determine whether the watchdog object's associated process is currently bound with an active RMI process in the network system 10. The watchdog thread acts to rebind its associated process with an active RMI process when the RMI process to which it is bound is no longer active. This function obviates the need to restart all of the processes bound with an RMI process when the RMI process becomes inactive. The network system 10 therefore operates more efficiently because processes become accessible as soon as an active RMI process becomes available to register the processes.

[0032] Processes in the network system 10 including an associated watchdog object may be conveniently referred to as “parent processes.” Similarly, a daemon process including an associated watchdog object may be referred to as a “parent daemon.” The watchdog thread may be employed in any process in the network system 10 that relies on RMI to communicate with other processes or users in the network system 10. Processes that may employ the watchdog thread include, for example, the processes 210, 212, 215, 230.

[0033] In FIG. 2, the functions of the SCM 12 are divided into separate processes to improve the reliability of the network. The configuration in FIG. 2, however, is merely illustrative, and other SCM network configurations employing RMI processes are also suitable for use with the present invention.

[0034] The operation of the watchdog thread will now be discussed with reference to FIG. 3. FIG. 3 is a flow chart illustrating the startup of the parent processes in the network system 10, and the startup of watchdog threads associated with parent processes in the network system 10.

[0035] In step S10, an RMI process is started. The RMI process can be started during installation of the SCM 12, or when other processes in the network are started. The other SCM processes in the network system 10 are then started in step S12.

[0036] In step S14, a watchdog object is called for each parent process, which initiates a watchdog thread for each parent process. In general, each parent process performs a method call to a watchdog object, which initiates a watchdog thread for that parent process. The watchdog thread monitors the status of the RMI process in order to determine whether the RMI process has registered its parent process. The operation of the watchdog thread, including the initialization call, will be discussed in further detail with reference to FIGS. 4 and 5.

[0037] Step S16 illustrates the termination of a parent process. As discussed with reference to FIG. 4, if the terminated process is a parent process, the watchdog thread for the parent process may then be terminated, as its function is no longer required. One or more parent processes may be terminated to, for example, perform maintenance on the SCM 12.

[0038] FIG. 4 illustrates the operation of a watchdog thread associated with a particular parent process. In step S18, the watchdog thread obtains a bound URL list from the RMI process via a list call. In step S20, the watchdog thread then determines whether its parent process's name is in the bound URL list of the RMI process. If the watchdog thread's parent process URL is in the bound URL list (i.e., the parent process is bound, or registered, with the RMI process) the watchdog thread returns to step S18, and periodically monitors the status of the RMI process for the presence of the parent process URL in the bound URL list.

[0039] If the watchdog thread's parent process URL is not in the bound URL list (i.e., the parent process is not bound, or registered, with the RMI process) the watchdog thread requests the RMI process to bind (via a rebind call) the parent process URL with the current, active RMI process (step S22). The parent process URL may be absent from the bound URL list of an active RMI process if, for example, the RMI process to which the parent process was bound became inactive for some reason.

[0040] Because the parent process is now bound with the active RMI process, users, daemons, and other processes attempting to access the parent process can now communicate with the parent process. If the parent process were not rebound with an active RMI process, the active RMI process would report that the parent process was not bound to it, and the parent process would not be accessible.

[0041] In step S24, it is determined whether thread termination has been requested. The watchdog thread may be terminated, for example, when its parent process has been terminated.
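Steps S18 through S24 can be sketched against the standard `java.rmi.registry.Registry` interface, whose `list()` and `rebind()` methods correspond to the list call and the rebind call described above. The sketch below is illustrative only and makes several assumptions: the class `WatchdogPassSketch`, the polling period, and the `AtomicBoolean` termination flag are hypothetical, as is the choice to retry on the next period when the registry cannot be reached. `InMemoryRegistry` is a test double, not part of the Java library, that lets the logic run without a live RMI process; creating a fresh instance simulates an RMI process that was restarted and lost its bindings.

```java
import java.rmi.AlreadyBoundException;
import java.rmi.NotBoundException;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.Registry;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

final class WatchdogPassSketch {

    /** One pass through steps S18 to S22; returns true if a rebind was needed. */
    static boolean checkAndRebind(Registry rmi, String name, Remote stub) throws RemoteException {
        boolean bound = Arrays.asList(rmi.list()).contains(name); // S18 and S20
        if (!bound) {
            rmi.rebind(name, stub);                                 // S22
        }
        return !bound;
    }

    /** Repeats the pass until termination is requested (S24). */
    static void run(Registry rmi, String name, Remote stub, AtomicBoolean terminate, long periodMillis) {
        while (!terminate.get()) {
            try {
                checkAndRebind(rmi, name, stub);
            } catch (RemoteException e) {
                // Assumption: the registry is unreachable right now; retry next period.
            }
            try {
                Thread.sleep(periodMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    /** Hypothetical in-memory stand-in for an RMI process, used only for the demo. */
    static final class InMemoryRegistry implements Registry {
        private final Map<String, Remote> bindings = new ConcurrentHashMap<>();

        @Override
        public Remote lookup(String name) throws NotBoundException {
            Remote found = bindings.get(name);
            if (found == null) {
                throw new NotBoundException(name);
            }
            return found;
        }

        @Override
        public void bind(String name, Remote obj) throws AlreadyBoundException {
            if (bindings.putIfAbsent(name, obj) != null) {
                throw new AlreadyBoundException(name);
            }
        }

        @Override
        public void unbind(String name) throws NotBoundException {
            if (bindings.remove(name) == null) {
                throw new NotBoundException(name);
            }
        }

        @Override
        public void rebind(String name, Remote obj) {
            bindings.put(name, obj);
        }

        @Override
        public String[] list() {
            return bindings.keySet().toArray(new String[0]);
        }
    }

    /** Runs three passes: while bound, just after a simulated restart, and once more. */
    static boolean[] demo() {
        String name = "dtf";
        Remote stub = new Remote() { };
        try {
            Registry first = new InMemoryRegistry();
            first.rebind(name, stub);
            boolean whileBound = checkAndRebind(first, name, stub);
            Registry restarted = new InMemoryRegistry(); // simulated restart: bindings lost
            boolean afterRestart = checkAndRebind(restarted, name, stub);
            boolean nextPass = checkAndRebind(restarted, name, stub);
            return new boolean[] { whileBound, afterRestart, nextPass };
        } catch (RemoteException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping the single pass separate from the loop makes the decision in step S20 easy to exercise on its own; only the pass that follows the simulated restart should report that a rebind was needed.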

[0042] FIG. 5 is a sequence diagram illustrating the operation of a watchdog thread for a parent process. In the exemplary embodiment illustrated by FIG. 5, the parent process is initiated by an object of class daemonImpl. In addition to an object, the term “daemonImpl” can represent the implementation of a daemon or other process.

[0043] The sequence diagram begins at the object named dtf of class daemonImpl, illustrated as dtf:DaemonImpl 300 in FIG. 5. The daemonImpl object dtf 300 initiates a parent process, having an associated object 304, named dtf, of class watchdog. The daemonImpl object dtf 300 first performs a synchronous rebind call to the RMI process initiated by the RMI object 302, which may include, for example, a JAVA® naming object. In the rebind call, the daemonImpl object dtf 300 provides its URL and interface object to the RMI process, thereby binding the daemonImpl object dtf 300 information with an active RMI process.

[0044] Once the daemonImpl object dtf 300 is bound with an active RMI process, the daemonImpl object dtf 300 performs an asynchronous initialize (init) call to its associated watchdog object, dtf:Watchdog 304. Calling the watchdog object 304 starts a watchdog thread for the daemonImpl object dtf 300. The watchdog thread is illustrated as extending from the bottom of the watchdog object 304.

[0045] The watchdog thread includes a loop 308, in which a synchronous list call is performed in order to determine whether the URL of the parent process is in the bound URL list of an active RMI process. If the parent process URL is not listed with an active RMI process, the watchdog thread performs a rebind call to the RMI process in order to rebind the parent process with the active RMI process. The watchdog thread continues to perform list calls as long as the watchdog thread has not been terminated.
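The call order in FIG. 5 (a synchronous rebind, then an asynchronous init call that starts the watchdog thread, then the list and rebind loop) can be sketched as follows. The sketch is a simplified model rather than the SCM code: `RmiProcess` is a hypothetical interface standing in for the list and rebind calls to the RMI process, `InMemoryRmi` is an in-memory stand-in whose `restart()` method simulates an RMI process that comes back with an empty bound URL list, and the URL `//cms/dtf`, the polling period, and interrupt-based termination are all assumptions. The class names `DaemonImpl` and `Watchdog` follow the labels used in FIG. 5.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class SequenceSketch {

    /** Hypothetical stand-in for the list and rebind calls made to the RMI process. */
    interface RmiProcess {
        Set<String> list();
        void rebind(String url);
    }

    /** In-memory RMI process; restart() simulates a restart that loses all bindings. */
    static final class InMemoryRmi implements RmiProcess {
        private final Set<String> bound = ConcurrentHashMap.newKeySet();

        @Override
        public Set<String> list() {
            return new HashSet<>(bound);
        }

        @Override
        public void rebind(String url) {
            bound.add(url);
        }

        void restart() {
            bound.clear();
        }
    }

    /** Watchdog object: init() starts the watchdog thread for one parent process. */
    static final class Watchdog {
        private final String parentUrl;
        private final RmiProcess rmi;
        private final long periodMillis;
        private Thread thread;

        Watchdog(String parentUrl, RmiProcess rmi, long periodMillis) {
            this.parentUrl = parentUrl;
            this.rmi = rmi;
            this.periodMillis = periodMillis;
        }

        /** Asynchronous init call: starts the thread and returns at once. */
        synchronized void init() {
            thread = new Thread(this::loop, "watchdog-" + parentUrl);
            thread.setDaemon(true); // do not keep the JVM alive for the watchdog alone
            thread.start();
        }

        /** Loop 308: list, rebind if the parent URL is missing, then wait. */
        private void loop() {
            while (!Thread.currentThread().isInterrupted()) {
                if (!rmi.list().contains(parentUrl)) {
                    rmi.rebind(parentUrl);
                }
                try {
                    Thread.sleep(periodMillis);
                } catch (InterruptedException e) {
                    return; // termination requested
                }
            }
        }

        synchronized void terminate() {
            if (thread != null) {
                thread.interrupt();
            }
        }
    }

    /** Parent process: binds itself first, then calls its watchdog object. */
    static final class DaemonImpl {
        private final String url;
        private final RmiProcess rmi;
        private final Watchdog watchdog;

        DaemonImpl(String url, RmiProcess rmi, long periodMillis) {
            this.url = url;
            this.rmi = rmi;
            this.watchdog = new Watchdog(url, rmi, periodMillis);
        }

        void start() {
            rmi.rebind(url);   // synchronous rebind call
            watchdog.init();   // asynchronous init call
        }

        void stop() {
            watchdog.terminate();
        }
    }

    /** Polls until the URL is bound or the timeout elapses. */
    static boolean awaitBound(RmiProcess rmi, String url, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() < deadline) {
            if (rmi.list().contains(url)) {
                return true;
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return rmi.list().contains(url);
    }
}
```

A caller would construct a `DaemonImpl`, call `start()`, and later call `stop()` when the parent process terminates. After `restart()` is called on the stand-in, the watchdog thread restores the binding on its next pass without the parent process itself being restarted.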

[0046] According to the above embodiment of the invention, if an RMI process becomes inactive for some reason, each parent process running a watchdog thread can quickly rebind with an active RMI process. Therefore, it is not necessary to restart every process upon inactivation of the RMI process.

[0047] The above sequence was described with reference to a parent process initiated by the daemonImpl object dtf 300; however, the principles of the present invention apply to any daemon or other process having an associated object for generating a watchdog thread.

[0048] The steps of the above embodiments can be implemented with hardware or by execution of programs, modules or scripts. The programs, modules or scripts can be stored or embodied on one or more computer readable mediums in a variety of formats, such as source code, object code or executable code, for example. The code can be implemented in the Java® programming language, as described above, or in other programming languages. The computer readable mediums may include, for example, both storage devices and signals. Exemplary computer readable storage devices include conventional computer system RAM (random access memory), ROM (read only memory), EPROM (erasable, programmable ROM), EEPROM (electrically erasable, programmable ROM), and magnetic or optical disks or tapes. Exemplary computer readable signals, whether modulated using a carrier or not, are signals that a computer system hosting or running the described methods can be configured to access, including signals downloaded through the Internet or other networks.

[0049] The terms and descriptions used herein are set forth by way of illustration only and are not meant as limitations. Those skilled in the art will recognize that many variations are possible within the spirit and scope of the invention as defined in the following claims, and their equivalents, in which all terms are to be understood in their broadest possible sense unless otherwise indicated.

What is claimed is:
 1. A method of binding processes in a network system, the method comprising: binding a parent process with a remote method invocation process; and calling an object associated with the parent process, the object initiating a thread to perform the steps of: monitoring the status of remote method invocation processes; and rebinding the parent process with an active remote method invocation process when the thread determines that its parent process is not bound with an active remote method invocation process.
 2. The method of claim 1, wherein the binding step comprises: providing a network address of the parent process to the active remote method invocation process.
 3. The method of claim 1, wherein the monitoring step comprises: performing a list call to an active remote method invocation process to determine whether the parent process is bound to an active remote method invocation process.
 4. The method of claim 3, wherein the rebinding step comprises: performing a rebind call to an active remote method invocation process.
 5. The method of claim 1, wherein the monitoring step comprises: calling an active remote method invocation process to determine whether the parent process network address is registered with an active remote method invocation process.
 6. The method of claim 1, comprising: binding a second parent process with a remote method invocation process; and calling a second object associated with the second parent process, the second object initiating a second thread to perform the steps of: monitoring the status of remote method invocation processes; and rebinding the second parent process with an active remote method invocation process when the second thread determines that the second parent process is not bound with an active remote method invocation process.
 7. The method of claim 1, wherein the step of binding a parent process comprises: binding one of an RMI daemon, a distributed task facility daemon, a log manager daemon, or a domain manager daemon, with an active RMI daemon.
 8. The method of claim 1, comprising: terminating the thread when the parent process is terminated.
 9. A network system, comprising: a plurality of remote nodes, at least one of the remote nodes running a remote method invocation process; and a management server for managing the remote nodes, the management server including at least one processor for running a remote method invocation process and at least one management process, each at least one management process being associated with an object capable of initiating a thread to perform the steps of: monitoring the status of remote method invocation processes; and rebinding the at least one management process with an active remote method invocation process when the thread determines that the at least one management process is not bound with an active remote method invocation process.
 10. The network system of claim 9, wherein the at least one management process comprises a plurality of management processes.
 11. The network system of claim 9, wherein the plurality of management processes comprise: a distributed task facility process; a domain manager process; and a log manager process.
 12. The network system of claim 9, wherein each of the remote nodes runs a service control manager agent process for performing server management tasks.
 13. The network system of claim 9, wherein the management server comprises: a secondary storage device, the secondary storage device comprising: a data repository; a depot; and a web server.
 14. The network system of claim 9, wherein the plurality of remote nodes are arranged into at least one node group, the network system comprising a service control manager for managing the at least one node group.
 15. A method of binding a parent process to a remote method invocation process, the method comprising: a) performing a rebind call to a remote method invocation process to provide a network address and an object interface of a parent process to the remote method invocation process; and b) performing an initialization call to an object associated with the parent process, the initialization call initiating a thread, the thread performing the steps of: 1) performing a list call to an active remote method invocation process to determine whether the parent process is bound with the active remote method invocation process; 2) performing a rebind call to an active remote method invocation process if the thread determines that the parent process is not bound with an active remote method invocation process; and 3) repeating steps 1 and 2.
 16. The method of claim 15, wherein the parent process is one of a remote method invocation daemon, a distributed task facility daemon, a log manager daemon, and a domain manager daemon.
 17. The method of claim 15, wherein the step of performing a rebind call includes the step of performing a rebind call to a remote method invocation daemon.
 18. The method of claim 15, wherein the step of performing a list call includes the step of performing a list call to a remote method invocation daemon.
 19. The method of claim 15, comprising: terminating the thread when the parent process is terminated.